feat(memory): host-managed memory lifecycle — two-lane retrieval + after-turn record (mem0 flow)#5326
Conversation
…ter-turn record (mem0 flow) Implements the mem0 host-managed memory flow on top of nearai#5205, with the surface area confined to the native memory provider and run-level orchestration. Retrieval (once per run): - The loop fetches long-term (user-general) and short-term (per-thread, `threads/<thread_id>/`) memory once at the first prompt build of a run and injects both into the prompt's "memory" section, replacing the dead `memory_snippets: Vec::new()` in loop_support. A per-run OnceCell caches the fetch; subsequent model steps reuse it. - Native `retrieve_context` scopes by `invocation.scope.thread_id`: Some(T) → only that thread's `threads/<T>/` subtree (short-term); None → the user's general memory, excluding `threads/*` (long-term). The lanes are disjoint, so the host concatenates them (short-term first) under the existing 4 KiB admission budget. - Graceful degradation throughout: a memory failure degrades the lane to empty and never breaks a turn. Recording (after each turn): - New low-level `MemoryService::record_interaction(invocation, { messages, run_id, metadata })` — the mem0 `add` data shape (`user_id`/`agent_id`/ `thread_id` ride the invocation scope). A default no-op trait impl lets each provider opt in: the host passes the DATA and the provider decides what to do with it (store verbatim, run LLM extraction, or nothing). Implements the reserved `memory.interaction.record.v1` vocabulary. - The native provider stores the full turn history under `threads/<thread_id>/`. - A host `AfterTurnMemoryRecorder` fires at the run-end seam (`turn_run_executor::apply_exit`, gated on `Completed`), reads the exchange from the thread transcript with the owner-rewritten scope, and hands it down. Post-terminal and best-effort: every error is `debug!`-logged and never fails the already-completed run. Reads and writes resolve on the local-dev runtime path; the production graph wires `None` (deferred, issue nearai#5013 — the same optionality as `user_profile_source`). Tested at unit + caller level across ironclaw_memory{,_native}, host_runtime, loop_support, turns, and reborn (two-lane fetch, prompt rendering, once-per-run cache, native record→retrieve, and a full-turn record through the executor). A full-composition e2e (record in one run → surface in a later run's model request) is called out as a follow-up. Design: docs/superpowers/specs/2026-06-25-reborn-memory-host-lifecycle-design.md Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. 🗂️ Base branches to auto review (2)
Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Plus Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request implements a host-managed memory lifecycle (mem0 flow) for the Reborn planned loop, introducing proactive memory retrieval across short-term and long-term lanes, as well as after-turn interaction recording. The feedback is highly constructive and identifies several key areas for optimization and robustness: concurrently fetching memory lanes to reduce latency, preserving distributed tracing correlation IDs, wrapping background interaction recording in a timeout to prevent blocking worker threads, filtering out empty messages before storage, and ensuring failed memory fetches are cached as empty to avoid repeated timeout penalties on subsequent iterations.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| let cached = self | ||
| .memory_snippets_cache | ||
| .get_or_try_init(|| async { | ||
| let Some(request) = self.build_memory_prompt_context_request(context_messages) | ||
| else { | ||
| return Ok(Vec::new()); | ||
| }; | ||
| service.load_memory_snippets(request).await | ||
| }) | ||
| .await; | ||
| match cached { | ||
| Ok(snippets) => snippets.clone(), | ||
| // A retrieval failure must never break the turn: degrade to empty. | ||
| // The cell stays uninitialized, so a later iteration may retry. | ||
| Err(error) => { | ||
| tracing::debug!( | ||
| kind = ?error.kind, | ||
| "memory context fetch failed; degrading to empty memory for this run" | ||
| ); | ||
| Vec::new() | ||
| } | ||
| } |
There was a problem hiding this comment.
Using get_or_try_init leaves the OnceCell uninitialized if the memory service fails. Consequently, every subsequent iteration (model step) in the same run will attempt to fetch memory again, repeatedly hitting timeouts/failures and severely degrading performance. Using get_or_init and caching an empty vector on failure ensures we only attempt to fetch memory exactly once per run, preventing repeated latency penalties.
let cached = self
.memory_snippets_cache
.get_or_init(|| async {
let Some(request) = self.build_memory_prompt_context_request(context_messages)
else {
return Vec::new();
};
match service.load_memory_snippets(request).await {
Ok(snippets) => snippets,
Err(error) => {
tracing::debug!(
kind = ?error.kind,
"memory context fetch failed; degrading to empty memory for this run"
);
Vec::new()
}
}
})
.await;
cached.clone()| let long_term_invocation = MemoryInvocation { | ||
| scope: short_term_invocation.scope.without_thread_and_mission(), | ||
| correlation_id: CorrelationId::new(), | ||
| }; |
There was a problem hiding this comment.
Reusing the same correlation_id is important for distributed tracing and log correlation. Creating a new CorrelationId for the long-term lane breaks the correlation of the two lanes under the same request.
| let long_term_invocation = MemoryInvocation { | |
| scope: short_term_invocation.scope.without_thread_and_mission(), | |
| correlation_id: CorrelationId::new(), | |
| }; | |
| let long_term_invocation = MemoryInvocation { | |
| scope: short_term_invocation.scope.without_thread_and_mission(), | |
| correlation_id: short_term_invocation.correlation_id.clone(), | |
| }; |
| let mut combined = self | ||
| .retrieve_lane( | ||
| short_term_invocation, | ||
| request.query.clone(), | ||
| request.max_snippets, | ||
| context_profile_id.clone(), | ||
| MemoryLane::ShortTerm, | ||
| ) | ||
| .await | ||
| .map_err(map_memory_service_error)?; | ||
| .await; | ||
| combined.extend( | ||
| self.retrieve_lane( | ||
| long_term_invocation, | ||
| request.query, | ||
| request.max_snippets, | ||
| context_profile_id, | ||
| MemoryLane::LongTerm, | ||
| ) | ||
| .await, | ||
| ); |
There was a problem hiding this comment.
The short-term and long-term lane retrievals are independent I/O-bound operations. Running them sequentially adds unnecessary latency to the prompt generation path. We can run them concurrently using tokio::join! to significantly reduce latency.
let (mut combined, long_term) = tokio::join!(
self.retrieve_lane(
short_term_invocation,
request.query.clone(),
request.max_snippets,
context_profile_id.clone(),
MemoryLane::ShortTerm,
),
self.retrieve_lane(
long_term_invocation,
request.query,
request.max_snippets,
context_profile_id,
MemoryLane::LongTerm,
)
);
combined.extend(long_term);| let user_content = history | ||
| .messages | ||
| .iter() | ||
| .find(|message| { | ||
| message.kind == MessageKind::User && message.turn_run_id.as_deref() == Some(run_id) | ||
| }) | ||
| .and_then(|message| message.content.as_deref())?; | ||
| let assistant_content = history | ||
| .messages | ||
| .iter() | ||
| .find(|message| { | ||
| message.kind == MessageKind::Assistant | ||
| && message.status == MessageStatus::Finalized | ||
| && message.turn_run_id.as_deref() == Some(run_id) | ||
| }) | ||
| .and_then(|message| message.content.as_deref())?; |
There was a problem hiding this comment.
If the user or assistant message content is empty or contains only whitespace, recording it wastes storage and pollutes the memory index. Filtering out empty/whitespace-only content ensures only meaningful interactions are recorded.
let user_content = history
.messages
.iter()
.find(|message| {
message.kind == MessageKind::User && message.turn_run_id.as_deref() == Some(run_id)
})
.and_then(|message| message.content.as_deref())
.filter(|content| !content.trim().is_empty())?;
let assistant_content = history
.messages
.iter()
.find(|message| {
message.kind == MessageKind::Assistant
&& message.status == MessageStatus::Finalized
&& message.turn_run_id.as_deref() == Some(run_id)
})
.and_then(|message| message.content.as_deref())
.filter(|content| !content.trim().is_empty())?;References
- When canonicalizing a string, perform cheaper validation checks (like length, emptiness, and character set) on a trimmed slice before performing more expensive operations like
replacethat allocate a new string.
| if let Err(error) = self | ||
| .memory_writer | ||
| .record_interaction(invocation, request) | ||
| .await | ||
| { | ||
| debug!(error = %error, "after-turn memory: record_interaction failed; run already complete"); | ||
| } |
There was a problem hiding this comment.
Since record_interaction is a best-effort background task, awaiting it directly in the worker's execution path without a timeout can block the worker indefinitely if the memory provider hangs or is extremely slow. Wrapping the call in a timeout ensures the worker remains responsive.
let record_result = tokio::time::timeout(
std::time::Duration::from_secs(5),
self.memory_writer.record_interaction(invocation, request),
)
.await;
match record_result {
Ok(Err(error)) => {
debug!(error = %error, "after-turn memory: record_interaction failed; run already complete");
}
Err(_) => {
debug!("after-turn memory: record_interaction timed out; run already complete");
}
_ => {}
}|
Superseded by #5327 — reopened as a same-repo PR on |
Summary
Implements the mem0 host-managed memory flow on top of #5205 (
reborn/memory-lift-followups), confined to the native memory provider + run-level orchestration. Memory now reaches the model proactively (the loop previously hardcodedmemory_snippets: Vec::new()), and each completed turn is recorded back so the short-term lane is populated.What changed
Retrieval (once per run)
retrieve_contextscopes byinvocation.scope.thread_id:Some(T)→ only that thread'sthreads/<T>/subtree (short-term);None→ the user's general memory, excludingthreads/*(long-term). Disjoint lanes → the host concatenates them (short-term first) under the existing 512 B/snippet + 4 KiB-aggregate admission.ProductionMemoryPromptContextService::load_memory_snippetsfetches both lanes once (long-term viaResourceScope::without_thread_and_mission());ThreadBackedLoopContextPortcaches the result in a per-runOnceCelland surfaces it into the"memory"prompt section on every model step.loop_driver_hostfactory → port, mirroringuser_profile_source. Graceful degradation everywhere — memory never breaks a turn.Recording (after each turn) — "provider decides"
MemoryService::record_interaction(invocation, { messages, run_id, metadata })— the mem0adddata shape (user_id/agent_id/thread_idride the invocation scope). A default no-op trait impl lets each provider opt in: the host passes the DATA and the provider decides what to do with it (store verbatim, run LLM extraction, or nothing). Implements the reservedmemory.interaction.record.v1vocabulary.threads/<thread_id>/log.md.AfterTurnMemoryRecorderfires atturn_run_executor::apply_exit(gated onCompleted), reads the exchange from the thread transcript with the owner-rewritten scope, and hands it down. Post-terminal, best-effort — every error isdebug!-logged and never fails the already-completed run.Testing
Unit + caller-level across
ironclaw_memory{,_native},host_runtime,loop_support,turns,reborn:memory_snippets; once-per-run cache (load_loop_contextfetch count stays 1); nativerecord_interaction→retrieve_contextsurfaces it; and a full-turn record through the executor (turn_runner_worker_records_after_turn_memory_on_completed_run).cargo clippyclean on the touched crates;ironclaw_reborn_clibuilds. (3sandbox_processtests fail locally only without Docker; 1 pre-existingunused import: Pathwarning lives in untouchedreborn_cli.)Known items / follow-ups
None— reads + writes resolve on the local-dev runtime path only; the production graph degrades to no memory (issue Reborn: wire production-graph composition for optional context sources (identity + profile) #5013, same asuser_profile_source).max_snippets— a scratch-heavy thread can starve the long-term lane (per-lane sub-budgets are a later refinement).on_run_enddurable summary) intentionally deferred.Design doc:
docs/superpowers/specs/2026-06-25-reborn-memory-host-lifecycle-design.md🤖 Generated with Claude Code